21 research outputs found

    Towards Web-based Biometric Systems Using Personal Browsing Interests

    We investigate the potential to use browsing habits and browser history as a new authentication and identification system for the Web with potential applications to anomaly and fraud detection. For the first time, we provide an empirical analysis using data from 4,578 users. We employ the traditional biometric analysis and show that the False Acceptance Rate can be low (FAR = 1.1%), though this results in a relatively high False Rejection Rate (FRR = 13.8%). The scheme may either be utilized by Web service providers (with access to user's browser history) or any Webmaster, using other specialized techniques such as timing-based browser cache sniffing or a browser extension. We construct such a proof-of-concept extension
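The two error rates quoted in the abstract above follow the standard biometric definitions. The sketch below shows how they can be computed from similarity scores; the function name `far_frr`, the toy scores and the threshold are illustrative assumptions, not the authors' data or method.

```python
from typing import Sequence, Tuple


def far_frr(genuine: Sequence[float],
            impostor: Sequence[float],
            threshold: float) -> Tuple[float, float]:
    """Return (FAR, FRR) when a score >= threshold means "accept".

    genuine:  scores from comparing a user against their own profile
    impostor: scores from comparing a user against someone else's profile
    """
    if not genuine or not impostor:
        raise ValueError("both score lists must be non-empty")
    # False acceptance: an impostor comparison that passes the threshold.
    false_accepts = sum(1 for s in impostor if s >= threshold)
    # False rejection: a genuine comparison that fails the threshold.
    false_rejects = sum(1 for s in genuine if s < threshold)
    return false_accepts / len(impostor), false_rejects / len(genuine)


# Hypothetical toy scores, chosen only to make the arithmetic easy to follow.
genuine_scores = [0.9, 0.8, 0.4, 0.95]
impostor_scores = [0.1, 0.3, 0.85, 0.2, 0.05]
far, frr = far_frr(genuine_scores, impostor_scores, threshold=0.5)
print(f"FAR={far:.2f} FRR={frr:.2f}")
```

Raising the threshold lowers FAR at the cost of a higher FRR, which is the trade-off the abstract reports.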

    On the Uniqueness of Web Browsing History Patterns

    We present the results of the first large-scale study of the uniqueness of Web browsing histories, gathered from a total of 368,284 Internet users who visited a history detection demonstration website. Our results show that for a majority of users (69%), the browsing history is unique and that users for whom we could detect at least 4 visited websites were uniquely identified by their histories in 97% of cases. We observe a significant rate of stability in browser history fingerprints: for repeat visitors, 38% of fingerprints are identical over time, and differing ones were correlated with original history contents, indicating static browsing preferences (for history subvectors of size 50). We report a striking result that it is enough to test for a small number of pages in order to both enumerate users' interests and perform an efficient and unique behavioral fingerprint; we show that testing 50 web pages is enough to fingerprint 42% of users in our database, increasing to 70% with 500 web pages
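One way to picture the uniqueness measurement described above is to treat each history as a bit vector over a fixed list of probed pages and count how many users hold a vector nobody else shares. This is a minimal sketch under that assumption; `unique_fraction`, the user IDs and the site names are hypothetical, and the study's actual pipeline may differ.

```python
from collections import Counter
from typing import Dict, Sequence, Set


def unique_fraction(histories: Dict[str, Set[str]],
                    probed_sites: Sequence[str]) -> float:
    """Fraction of users whose visited/not-visited vector is unique."""
    if not histories:
        raise ValueError("histories must be non-empty")
    # Fingerprint = one boolean per probed site, in a fixed order.
    fingerprints = {
        user: tuple(site in visited for site in probed_sites)
        for user, visited in histories.items()
    }
    counts = Counter(fingerprints.values())
    unique_users = sum(1 for fp in fingerprints.values() if counts[fp] == 1)
    return unique_users / len(fingerprints)


# Hypothetical toy data: four users, three probed sites.
toy_histories = {
    "u1": {"a.example", "b.example"},
    "u2": {"a.example"},
    "u3": {"a.example"},
    "u4": {"c.example"},
}
print(unique_fraction(toy_histories, ["a.example", "b.example", "c.example"]))
print(unique_fraction(toy_histories, ["a.example"]))
```

Probing more pages can only split groups of identical fingerprints further, which is consistent with the reported rise from 50 to 500 tested pages.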

    Selling Off Privacy at Auction

    Real-Time Bidding (RTB) and Cookie Matching (CM) are transforming the advertising landscape to an extremely dynamic market and make targeted advertising considerably permissive. The emergence of these technologies allows companies to exchange user data as a product and therefore raises important concerns from privacy perspectives. In this paper, we perform a privacy analysis of CM and RTB and quantify the leakage of users' browsing histories due to these mechanisms. We study this problem on a corpus of users' Web histories, and show that using these technologies, certain companies can significantly improve their tracking and profiling capabilities. We detect 41 companies serving ads via RTB and over 125 using Cookie Matching. We show that 91% of users in our dataset were affected by CM and in certain cases, 27% of users' Web browsing histories could be leaked to 3rd-party companies through RTB. We expose a design characteristic of RTB systems to observe the prices which advertisers pay for serving ads to Web users. We leverage this feature and provide important insights into these prices by analyzing different user profiles and visiting contexts. Our study shows the variation of prices according to context information including visiting site, time and user's physical location. We experimentally confirm that users with known Web browsing history are valued higher than newcomers, that some user profiles are more valuable than others, and that users' intents, such as looking for a commercial product, are sold at higher prices than users' Web browsing histories. In addition, we show that there is a huge gap between users' perception of the value of their personal information and its actual value on the market. A recent study by Carrascal et al. showed that, on average, users value the disclosure of their presence on a Web site at EUR 7. We show that elements of users' Web browsing histories are routinely being sold off for less than $0.0005

    The leaking battery: A privacy analysis of the HTML5 Battery Status API

    We highlight the privacy risks associated with the HTML5 Battery Status API. We put special focus on its implementation in the Firefox browser. Our study shows that websites can discover the capacity of users’ batteries by exploiting the high precision readouts provided by Firefox on Linux. The capacity of the battery, as well as its level, expose a fingerprintable surface that can be used to track web users in short time intervals. Our analysis shows that the risk is much higher for old or used batteries with reduced capacities, as the battery capacity may potentially serve as a tracking identifier. The fingerprintable surface of the API could be drastically reduced without any loss in the API’s functionality by reducing the precision of the readings. We propose minor modifications to Battery Status API and its implementation in the Firefox browser to address the privacy issues presented in the study. Our bug report for Firefox was accepted and a fix is deployed
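The effect of reduced precision mentioned above can be illustrated with a small sketch: high-precision battery levels tend to be distinct from one reading to the next, while rounded levels collapse into a few shared values. The readings below are made up for illustration, and `distinct_readings` is a hypothetical helper; this is not the Firefox fix itself.

```python
from typing import Iterable


def distinct_readings(levels: Iterable[float], digits: int) -> int:
    """Count distinct battery levels after rounding to `digits` decimals."""
    return len({round(level, digits) for level in levels})


# Hypothetical high-precision level readouts (fractions of full charge).
readings = [
    0.9301994301994302,
    0.9312345678901234,
    0.9298765432109876,
    0.9333333333333333,
]

print(len(set(readings)))              # distinct values at full precision
print(distinct_readings(readings, 2))  # distinct values at two decimals
```

With fewer distinguishable values, a reading carries less identifying information, while the level is still precise enough for a page to react to a low battery.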

    Analyser les risques sur la vie privée et l'économie du profilage WEB (Analyzing Privacy Risks and the Economics of Web Profiling)

    New technologies introduce new problems and risks. For example, Internet users are constantly tracked and profiled online. This profiling allows websites to personalize, and thus improve, the service they provide to each user. However, profiling also raises problems of intimacy and privacy protection. It is widely acknowledged that personal data are often exchanged, or even sold, and that a real economy of personal data exists. This thesis studies how such personal data, and in particular Web histories (that is, the list of websites visited by a user), are collected, exchanged and sold. It offers a privacy analysis of auction systems for targeted advertising. It shows how the various actors in online advertising collect and exchange personal data, and studies the risks for Internet users. It also offers an economic analysis and shows, in particular, that the data are sold off for a few thousandths of a dollar.

    New media introduce new problems and risks. There are important security and privacy considerations related to online interactions. Users browsing the Web leave a constant trail of traces of their Web actions. A large number of entities take advantage of this data to constantly improve how Web services function, often offering rich personalization capabilities; to achieve this, user data is needed. To obtain user data, Web users are being tracked and profiled. Having user data may help enhance functionality and usability, but it also has the potential to introduce complex privacy problems related to data collection, storage and processing. The incentives to gather user data are economic in nature: user data is monetized.

    We start with a description of privacy problems and risks, highlighting their roots in technology changes; users must constantly struggle to adapt to changes. The legal frameworks relating to privacy are about to change: Web companies will have to adapt to new realities. The first part of this thesis is devoted to measuring the consequences of private data leaks and tracking. We show how Web browsing histories convey insight into user interests. We study the risks of Web browsing history leaks. We point out that browsing history is to a large extent unique; we base this on a dataset of more than 350k partial history fingerprints. The consequence is that if browsing histories are personally identifiable information (PII), the upcoming European privacy legal frameworks could potentially result in strict guidelines for their collection, storage and processing. The tracking measurement of third-party resources confirms the popular notion that most of the tracking is carried out by US-based companies. This creates interesting information asymmetries, which are of great importance, especially if user data could be simply equated to financial and economic benefits.

    The second part discusses the value of privacy. We study the emerging technology of Real-Time Bidding (RTB), online real-time auctions of ad spaces. We highlight that during the auction phase, bidders in RTB obtain user information such as the visited Web site or user location, and they pay for serving ads. In other words, user data flows are strictly related to financial flows. User data is thus monetized. We expose an interesting design characteristic of RTB which allows us to monitor a channel with winning bids, that is, the dynamically established fees bidders pay for displaying their ads. We perform a detailed measurement of RTB and study how this price for user information varies according to aspects such as time of day, user location and type of visited Web site. Using data obtained from real users, we also study the effect of user profiles. Users are indeed treated differently, based on their previously visited Web sites (browsing history). We observed variability in prices of RTB ads, based on those traits. The price for user information in RTB is volatile and typically is in the range of $0.0001 to $0.001. This study also had a decidedly important transparency part. We introduced a Web browser extension that reveals the price that bidders in RTB pay. This demonstrates how user awareness could be improved.

    In part three, we continue the transparency trail. We point out that Web browsers allow every Web site (or third-party resources they include) to record the mouse movements of their visitors. We point out that recent advances in mouse movement analysis point to the notion that mouse movements can potentially be used to recognize and track Web users across the Web; mouse movement analysis can also be used to infer users' demographic data such as age. We highlight the existence of mouse movement analytics, third-party scripts specializing in mouse movement collection. We also suggest that Web browser vendors should consider including permissions for accessing the API enabling this kind of recording

    Data Harvesting 2.0: from the Visible to the Invisible Web

    Personal data are fuelling a fast emerging industry which transforms them into added value. Harvesting these data is therefore of the utmost importance for the economy. In this paper, we study the flows of personal data at a global level, and distinguish countries based on their capacity to harvest data. We establish a cartography of international data channels on the visible and invisible Web. The visible Web is composed of the sites that are available to the general public and are typically indexed by search engines. The invisible Web refers to tags, Web bugs, pixels and beacons that appear on Websites to track and profile users. It is well known that the US dominates the visible Web with more than 70% of the top 100 sites in the world. We show that this domination is even stronger on the invisible Web. The largest proportion of trackers in most countries are indeed from the US. Apart from the US, two countries exhibit an original strategy. China dominates its visible Web with a majority of local sites, but surprisingly these sites still contain a majority of US trackers. Russia also dominates its visible Web, and is the only country with more local trackers than US ones.

    I'm 2.8% Neanderthal - The Beginning of Genetic Exhibitionism?

    Direct-to-consumer genetic testing is gaining popularity. However, the sensitive nature of personal genomic sequencing results might not be fully understood by the general public. In this paper we study the examples of disclosure of this sensitive information on social networks. We found that Twitter users often post their results publicly. We observed that information on ethnic background is much more frequently released than other information, for example relating to disease risk. This data could be of potential value to entities such as insurance companies. We found that about 24% of the analyzed tweets that mentioned ethnicity results also contained percentage data. In cases of users disclosing more details of their ethnic background, we found about 96% of these profiles also included identifying information and consequently can be attributed to individuals. As a result, external entities such as insurance companies can gain an insight in the genetic test results and in the end the users could be subject to genetic discrimination

    Selling Off Privacy at Auction

    Real-Time Bidding (RTB) and Cookie Matching (CM) are transforming the advertising landscape to an extremely dynamic market and make targeted advertising considerably permissive. The emergence of these technologies allows companies to exchange user data as a product and therefore raises important concerns from privacy perspectives. In this paper, we perform a privacy analysis of CM and RTB and quantify the leakage of users’ browsing histories due to these mechanisms. We study this problem on a corpus of users’ Web histories, and show that using these technologies, certain companies can significantly improve their tracking and profiling capabilities. We detect 41 companies serving ads via RTB and over 125 using Cookie Matching. We show that 91% of users in our dataset were affected by CM and in certain cases, 27% of users’ histories could be leaked to 3rd-party companies through RTB. We expose a design characteristic of RTB systems to observe the prices which advertisers pay for serving ads to Web users. We leverage this feature and provide important insights into these prices by analyzing different user profiles and visiting contexts. Our study shows the variation of prices according to context information including visiting site, time and user’s physical location. We experimentally confirm that users with known history are valued higher than newcomers, that some user profiles are more valuable than others, and that users’ intents, such as looking for a commercial product, are sold at higher prices than users’ browsing histories. In addition, we show that there is a huge gap between users’ perception of the value of their personal information and its actual value on the market. A recent study by Carrascal et al. showed that, on average, users value the disclosure of their presence on a Web site at EUR 7. We show that elements of users’ browsing histories are routinely being sold off for less than $0.0005